    Adaptive Transactional Memories: Performance and Energy Consumption Tradeoffs

    Energy efficiency is becoming a pressing issue, especially in large data centers, where it simultaneously entails non-negligible management costs, an increased probability of hardware faults, and a significant environmental footprint. In this paper, we study how Software Transactional Memories (STM) can benefit both power saving and overall application execution performance. This is related to the fact that encapsulating shared-data accesses within transactions gives the STM middleware the freedom to both ensure consistency and reduce actual data contention, the latter having been shown to affect the overall power needed to complete the application’s execution. We have selected a set of self-adaptive extensions to existing STM middlewares (namely, TinySTM and R-STM) to show how self-adapting computation can better capture the actual degree of parallelism and/or logical contention on shared data, further enhancing the intrinsic benefits provided by STM. This benefit comes at a cost, namely the execution time required by the proposed approaches to precisely tune the execution parameters for reducing power consumption and enhancing execution performance. Nevertheless, the results provided here show that adaptivity is a strictly necessary requirement for reducing energy consumption in STM systems: without it, no acceptable level of energy efficiency can be reached at all.
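    A minimal sketch of the kind of self-adaptive loop the paper argues for: the runtime periodically re-tunes the number of active STM threads toward the configuration that minimizes energy per committed transaction. The probe sample_energy_per_commit is a hypothetical stand-in for platform counters (for instance, RAPL energy readings combined with commit counts); the cost curve below is fabricated only to make the example self-contained.

        import random

        def sample_energy_per_commit(threads):
            # Hypothetical probe: a real STM runtime would read energy counters
            # and commit counts over a sampling window. Here we fake a convex
            # cost curve with noise so the sketch runs on its own.
            optimum = 8
            return (threads - optimum) ** 2 + 10 + random.uniform(-0.5, 0.5)

        def adapt(threads, max_threads):
            # Simple hill climbing: probe the neighboring configurations and
            # move toward the lowest energy-per-commit estimate.
            best, best_cost = threads, sample_energy_per_commit(threads)
            for cand in (threads - 1, threads + 1):
                if 1 <= cand <= max_threads:
                    cost = sample_energy_per_commit(cand)
                    if cost < best_cost:
                        best, best_cost = cand, cost
            return best

        threads = 1
        for epoch in range(20):        # one adaptation step per sampling window
            threads = adapt(threads, max_threads=16)
        print("selected thread count:", threads)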

    Analytical/ML Mixed Approach for Concurrency Regulation in Software Transactional Memory

    In this article we exploit a combination of analytical and Machine Learning (ML) techniques to build a performance model that allows the level of concurrency of applications based on Software Transactional Memory (STM) to be tuned dynamically. Our mixed approach has the advantage of reducing the training time of pure machine learning methods, while avoiding the approximation errors that typically affect pure analytical approaches. Hence it allows the very fast construction of highly reliable performance models, which can be promptly and effectively exploited for optimizing actual application runs. We also present a real implementation of a concurrency regulation architecture, based on the mixed modeling approach, which has been integrated with the open source TinySTM package, together with experimental data related to runs of applications taken from the STAMP benchmark suite, demonstrating the effectiveness of our proposal. © 2014 IEEE
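    As an illustration of the mixed analytical/ML idea (not the paper's actual model), the sketch below pairs a toy analytical throughput formula with a correction learned from the residuals of a handful of measured samples; learning only the residual is what keeps training short compared to a pure black-box learner. All formulas and data here are illustrative placeholders.

        def analytical_throughput(k, p_abort=0.1):
            # Toy analytical model: with k threads, each commit succeeds with
            # probability (1 - p_abort)^(k-1) due to conflicts with the others.
            return k * (1 - p_abort) ** (k - 1)

        # Observed (thread count, measured throughput) samples, fabricated
        # here only to make the example self-contained.
        samples = [(1, 0.95), (2, 1.7), (4, 2.8), (8, 3.1), (16, 1.9)]

        # Learn a per-thread-count residual correction by simple averaging
        # (the paper's learner is more sophisticated; averaging keeps the
        # sketch short).
        residual = {}
        for k, measured in samples:
            residual.setdefault(k, []).append(measured - analytical_throughput(k))
        correction = {k: sum(v) / len(v) for k, v in residual.items()}

        def mixed_prediction(k):
            # Analytical baseline plus learned correction.
            return analytical_throughput(k) + correction.get(k, 0.0)

        best_k = max(correction, key=mixed_prediction)
        print("suggested concurrency level:", best_k)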

    Providing Transaction Class-Based QoS in In-Memory Data Grids via Machine Learning

    Elastic architectures and the "pay-as-you-go" resource pricing model offered by many cloud infrastructure providers may seem the right choice for companies dealing with data-centric applications characterized by highly variable workloads. In such a context, in-memory transactional data grids have proven particularly well suited to exploiting the advantages provided by elastic computing platforms, mainly thanks to their ability to be dynamically (re-)sized and tuned. However, when specific QoS requirements must be met, these architectures have proven complex for humans to manage. In particular, their management is a very complex task without the support of mechanisms for run-time automatic sizing/tuning of the data platform and the underlying (virtual) hardware resources provided by the cloud. In this paper, we present a neural network-based architecture in which the system is constantly and automatically re-configured, particularly in terms of computing resources.
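    The reconfiguration loop the abstract describes can be sketched as follows: a trained predictor (a stub here) estimates the latency of each transaction class under a candidate resource configuration, and the controller picks the cheapest node count that meets every class's QoS target. The predictor shape and the workload descriptor are assumptions made for illustration, not the paper's network.

        def predicted_latency_ms(nodes, tx_class, load_tps):
            # Stand-in for the neural network: latency drops with more nodes,
            # grows with load, and writes are slower than reads.
            base = 5.0 if tx_class == "read" else 12.0
            return base * load_tps / (nodes * 100.0)

        def choose_config(load_tps, qos_ms, max_nodes=32):
            # Scan node counts from cheapest upward; return the first (hence
            # cheapest) configuration whose predicted latencies meet all SLAs.
            for nodes in range(1, max_nodes + 1):
                if all(predicted_latency_ms(nodes, c, load_tps) <= target
                       for c, target in qos_ms.items()):
                    return nodes
            return max_nodes

        print(choose_config(load_tps=400, qos_ms={"read": 10.0, "write": 30.0}))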

    Automatic tuning of the parallelism degree in hardware transactional memory

    Transactional Memory (TM) is an emerging paradigm that promises to ease the development of parallel applications. Due to its inherently speculative nature, however, TM can suffer from performance degradation in the presence of conflict-intensive workloads. A key technique to tackle this issue consists of dynamically regulating the number of concurrent threads, which allows for selecting the concurrency level that best fits the intrinsic parallelism of specific applications. In this area, several self-tuning approaches have been proposed for software-based implementations of TM (STM). In this paper we investigate the effectiveness of these techniques when applied to Hardware TM (HTM), a theme that is particularly relevant and timely given the recent integration of hardware support for TM in the latest generation of mainstream Intel processors. Our study, conducted on Intel's implementation of HTM, identifies several issues that arise when employing techniques originally conceived for STM. Motivated by these findings, we propose an innovative machine learning-based technique explicitly designed to take into account the peculiarities of HTM systems, and demonstrate its advantages, in terms of higher accuracy and shorter learning times, using the STAMP benchmark suite. © 2014 Springer International Publishing Switzerland
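    A sketch of the HTM-specific angle, under stated assumptions: unlike STM, hardware TM also aborts for capacity/associativity reasons, so the learner's features should include the breakdown of abort causes. A tiny nearest-neighbour predictor stands in for the paper's machine learning technique, and the training tuples are illustrative.

        def knn_best_threads(features, training, k=3):
            # training: list of ((conflict_abort_rate, capacity_abort_rate),
            # best_thread_count) pairs observed on past workload phases.
            dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
            nearest = sorted(training, key=lambda t: dist(t[0], features))[:k]
            counts = [n for _, n in nearest]
            return round(sum(counts) / len(counts))

        training = [
            ((0.05, 0.01), 8),   # low contention, txns fit in cache: scale up
            ((0.40, 0.02), 2),   # conflict aborts dominate: throttle threads
            ((0.10, 0.60), 1),   # capacity aborts dominate: fall back hard
        ]
        print(knn_best_threads((0.08, 0.05), training))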

    APART: Low cost active replication for multi-tier data acquisition systems

    This paper proposes APART (A Posteriori Active ReplicaTion), a novel active replication protocol specifically tailored for multi-tier data acquisition systems. Unlike existing active replication solutions, APART does not rely on a-priori coordination schemes that determine the same schedule of events across all replicas; instead, it ensures replica consistency by means of an a-posteriori reconciliation phase. The latter is triggered only when the replicated servers externalize their state by producing an output event towards a different tier. On the one hand, this allows coping with non-deterministic replicas, unlike existing active replication approaches. On the other hand, it allows attaining striking performance gains in the case of silent replicated servers, which only sporadically, yet unpredictably, produce output events in response to the receipt of a (possibly large) volume of input messages. This is a common scenario in data acquisition systems, where sink processes, which filter and/or correlate incoming sensor data, produce output messages only if some application-relevant event is detected. Further, the APART replica reconciliation scheme is extremely lightweight, as it exploits the cross-tier communication pattern spontaneously induced by the application logic to avoid explicit replica coordination messages. © 2008 IEEE
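    A toy rendition of the a-posteriori principle (not the full APART protocol): replicas process the input stream independently, tolerating non-determinism between outputs, and a reconciliation phase aligns their states only when an output event is about to be externalized toward another tier. The event-detection logic below is a fabricated stand-in for a real sink process.

        import random

        class Replica:
            def __init__(self):
                self.state = 0
            def handle(self, msg):
                # Non-deterministic processing is tolerated between outputs.
                self.state += msg + random.choice((0, 1))
                return self.state > 100   # True => output event detected

        replicas = [Replica() for _ in range(3)]
        for msg in range(50):             # incoming sensor-data stream
            emitters = [r for r in replicas if r.handle(msg)]
            if emitters:
                # Output is about to be externalized: reconcile a-posteriori
                # by agreeing on the emitting replica's state (simplified).
                agreed = emitters[0].state
                for r in replicas:
                    r.state = agreed
                print("output event, reconciled state:", agreed)
                break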

    Auto-tuning of Cloud-based In-memory Transactional Data Grids via Machine Learning

    In-memory transactional data grids have proven extremely well suited to cloud-based environments, given that they fit the elasticity requirements imposed by the pay-as-you-go cost model. In particular, the non-reliance on stable storage devices simplifies the dynamic resizing of these platforms, which typically only involves setting up (or shutting down) some data-cache instance. On the other hand, determining the appropriate number of cache servers to deploy, and the degree of replication of slices of data, in order to optimize reliability/availability and performance tradeoffs, is far from trivial. As an example, scaling up/down the size of the underlying infrastructure might give rise to hardly predictable secondary effects on the synchronization protocol adopted to guarantee data consistency while supporting transactional accesses. In this paper we investigate the use of machine learning approaches aimed at providing a means for automatically tuning the data grid configuration, which is achieved via dynamic selection of both the number of cache servers and the degree of replication of the data objects. The final target is to determine configurations that are able to guarantee specific throughput or latency values (such as those established by some SLA), under a specific workload profile/intensity, while at the same time minimizing the cost of the cloud infrastructure. Our proposal has been integrated within an operating environment relying on the well-known Infinispan data grid, namely a mainstream open source product by the Red Hat JBoss division. Experimental data supporting the effectiveness of our proposal are also provided, obtained by deploying the data platform on top of Amazon EC2. © 2012 IEEE
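    The configuration search the abstract describes can be sketched as a simple enumeration: query a learned throughput predictor (a stub below) for each (cache servers, replication degree) pair and keep the cheapest configuration that still meets the SLA. The predictor shape and the per-instance cost model are assumptions made only for illustration.

        def predicted_throughput(servers, repl, load):
            # Stub for the ML model: more servers help, while higher
            # replication degrees pay extra synchronization on writes.
            return servers * 1000.0 / (1 + 0.3 * (repl - 1)) - 0.1 * load

        def cheapest_config(load, sla_tps, max_servers=16):
            # Exhaustively scan candidate configurations and keep the one
            # meeting the SLA at minimum cost (cost = instance count here).
            best = None
            for servers in range(1, max_servers + 1):
                for repl in range(1, servers + 1):
                    if predicted_throughput(servers, repl, load) >= sla_tps:
                        cost = servers
                        if best is None or cost < best[0]:
                            best = (cost, servers, repl)
            return best

        print(cheapest_config(load=2000, sla_tps=5000))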

    Machine learning-based thread-parallelism regulation in software transactional memory

    Transactional Memory (TM) stands as a powerful paradigm for manipulating shared data in concurrent applications. It avoids the drawbacks of coarse-grained locking schemes, namely the potentially excessive limitation of concurrency, while jointly providing synchronization transparency to programmers, which is achieved by embedding code blocks accessing shared data within transactions. On the downside, excessive transaction aborts may arise in scenarios with non-negligible volumes of conflicting data accesses, which might significantly impair performance. TM therefore needs to resort to methods enabling applications to run with the maximum degree of transaction concurrency that still avoids thrashing. In this article, we focus on Software TM (STM) implementations and present a machine learning-based approach that enables the dynamic selection of the best-suited number of threads to be kept alive during specific phases of the execution of STM applications, depending on (variations of) the shared-data access pattern. Two key contributions are provided by our approach: (i) the identification of a well-suited set of features enabling the instantiation of a reliable neural network-based performance model, and (ii) the introduction of mechanisms that reduce the run-time overhead of sampling these features. We integrated a real implementation of our machine learning-based thread-parallelism regulation approach within the TinySTM open source package and present experimental data, based on the STAMP benchmark suite, which show the effectiveness of the presented thread-parallelism regulation policy in optimizing transaction throughput.
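    A minimal sketch of the two contributions named above, with hypothetical feature names and a simple rule in place of the neural network: workload features are sampled on only a fraction of transactions to bound run-time overhead, and the aggregated statistics drive the choice of how many threads to keep alive.

        import random

        SAMPLE_EVERY = 64      # probe 1 in 64 transactions to cut overhead

        stats = {"read_set": 0.0, "write_set": 0.0, "abort": 0.0, "n": 0}

        def on_transaction_end(tx_id, read_set, write_set, aborted):
            # Cheap fast path: most transactions skip feature collection.
            if tx_id % SAMPLE_EVERY:
                return
            stats["read_set"] += read_set
            stats["write_set"] += write_set
            stats["abort"] += aborted
            stats["n"] += 1

        def suggest_threads(max_threads=16):
            n = max(stats["n"], 1)
            abort_rate = stats["abort"] / n
            # Stand-in for the neural network: back off under high contention.
            return max(1, round(max_threads * (1 - abort_rate) ** 2))

        for tx in range(10_000):   # simulated workload with ~30% abort rate
            on_transaction_end(tx, read_set=random.randint(1, 50),
                               write_set=random.randint(0, 10),
                               aborted=random.random() < 0.3)
        print("suggested threads:", suggest_threads())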

    Dynamic feature selection for machine-learning based concurrency regulation in STM

    In this paper we explore machine-learning approaches for dynamically selecting the appropriate number of concurrent threads in applications relying on Software Transactional Memory (STM). Specifically, we present a solution that dynamically shrinks or enlarges the set of input features to be exploited by the machine learner. This allows the concurrency level to be tuned while also minimizing the overhead of input-feature sampling, given that the cardinality of the input-feature set is always tuned to the minimum value that still guarantees a reliable workload characterization. We also present a fully fledged implementation of our proposal within the TinySTM open source framework, and provide the results of an experimental study relying on the STAMP benchmark suite, which show a significant reduction of the response time with respect to proposals based on static feature selection. © 2014 IEEE
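    A sketch of the dynamic feature-selection loop, under illustrative assumptions: start from a minimal feature set, enlarge it while the model's workload characterization is not reliable enough, and shrink it again when fewer features suffice, so sampling overhead stays at the minimum. The feature names and the reliability estimate below are placeholders for the paper's actual choices.

        ALL_FEATURES = ["abort_rate", "read_set", "write_set",
                        "tx_duration", "ntc_time"]   # ordered by assumed value

        def model_error(features):
            # Stub: pretend validation error drops with each extra feature
            # but saturates; a real system would cross-validate the learner
            # online against observed throughput.
            return 0.30 / len(features) + 0.02

        def tune_feature_set(features, tol=0.08):
            # Enlarge while characterization is unreliable, shrink when a
            # smaller (cheaper to sample) set is still reliable enough.
            if model_error(features) > tol and len(features) < len(ALL_FEATURES):
                return ALL_FEATURES[:len(features) + 1]
            if len(features) > 1 and model_error(features[:-1]) <= tol:
                return features[:-1]
            return features

        features = ALL_FEATURES[:1]
        for _ in range(6):
            features = tune_feature_set(features)
        print("active features:", features)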